Abstract

This paper examines the performance of Valiant’s PRAM algorithm for finding the
maximum of a set of numbers, assuming that a modified PRAM model is used. The modified
PRAM is like a standard CRCW PRAM, except that multiple read or write requests to a
single memory location are handled sequentially. It is shown that, using this model, Valiant’s
algorithm requires O(sqrt(N)) time to find the maximum of N numbers using N processors,
and that it does require time proportional to sqrt(N) infinitely often.
Valiant’s Maximum Algorithm with Sequential Memory Accesses


Valiant has created a PRAM algorithm for finding the maximum of N numbers that uses
P = N processors and operates in O(loglog P) time [1]. That algorithm consists of O(loglog
P) stages, each of which takes constant time. At the start of each stage, a certain number of
candidates for the maximum are present. These candidates are divided into roughly
equal-sized groups, and the maximum item from each group is calculated. These group maxima
form the set of candidates for the following stage. Within each group, all possible pairs of
items are compared, with one processor being responsible for each such pair. As a result, a
group with X items requires that $\binom{X}{2} = X(X-1)/2$ processors be assigned to it.
The group sizes are chosen to be as large as possible given the limited number of processors
that are available.
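
The stage structure described above can be sketched as a sequential simulation. This is only an illustration under stated assumptions, not Valiant's PRAM code: the names `processors_needed`, `split_evenly`, `group_max` and `valiant_max` are hypothetical, and the grouping rule (use the fewest near-equal groups whose pairwise comparisons fit in P = N processors) is one plausible reading of "as large as possible".

```python
from math import comb

def processors_needed(n_items, n_groups):
    """Processors used when n_items are split into n_groups near-equal groups,
    with one processor per pair of items inside each group."""
    q, r = divmod(n_items, n_groups)
    return r * comb(q + 1, 2) + (n_groups - r) * comb(q, 2)

def split_evenly(items, n_groups):
    """Split items into n_groups contiguous groups whose sizes differ by at most 1."""
    q, r = divmod(len(items), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        size = q + 1 if g < r else q
        groups.append(items[start:start + size])
        start += size
    return groups

def group_max(group):
    """All-pairs comparison: the maximum is an item that loses no comparison.
    Each (i, j) pair stands for the work of one PRAM processor."""
    lost = [False] * len(group)
    for i in range(len(group)):
        for j in range(i + 1, len(group)):
            if group[i] < group[j]:
                lost[i] = True
            else:
                lost[j] = True
    return group[lost.index(False)]

def valiant_max(values):
    """Sequential simulation with P = N; returns (maximum, number_of_stages)."""
    candidates = list(values)
    p = len(candidates)  # assumption: as many processors as inputs
    stages = 0
    while len(candidates) > 1:
        n = len(candidates)
        # Fewest groups (hence largest groups) that fit in p processors.
        r = next(g for g in range(1, n + 1) if processors_needed(n, g) <= p)
        candidates = [group_max(g) for g in split_evenly(candidates, r)]
        stages += 1
    return candidates[0], stages

print(valiant_max([3, 1, 4, 1, 5, 9, 2, 6, 5]))
```

The simulation runs the comparisons one after another, so it shows only the grouping and the stage count, not the parallel running time.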

Valiant’s algorithm requires that in unit time a large number of processors can read from
or write to a single memory location. This paper examines the performance of Valiant’s
algorithm when this assumption is changed. In particular, it will be assumed that multiple
requests to a single memory location are handled sequentially. This analysis arises as a result
of investigations by L. Snyder [2].

Let N(t) be the number of items left at time (stage) t, let R(t) be the number of
groups that the N(t) items are divided into, and let S(t) be the size of the largest of these
groups. For instance, assuming P processors and N = P numbers to begin with, N(0) =
P, R(0) = $\lceil P/3 \rceil$, and S(0) = 3. Because each group contributes one candidate
to the next stage, R(t) = N(t+1) for all values of t. Also, $S(t) = \lceil N(t)/R(t) \rceil$
for all values of t.
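
To make the quantities N(t), R(t) and S(t) concrete, the following sketch tabulates them for a given P. It assumes a specific grouping rule that the text does not spell out: at each stage, take the smallest number of near-equal groups whose pairwise comparisons fit in P processors. The function names `processors_needed` and `schedule` are hypothetical.

```python
from math import comb

def processors_needed(n_items, n_groups):
    """Processors used by n_groups near-equal groups, one processor per pair."""
    q, r = divmod(n_items, n_groups)
    return r * comb(q + 1, 2) + (n_groups - r) * comb(q, 2)

def schedule(p):
    """Return [(N(t), R(t), S(t)), ...] for N = P inputs under the assumed rule."""
    rows, n = [], p
    while n > 1:
        # Smallest group count that fits, so the groups are as large as possible.
        r = next(g for g in range(1, n + 1) if processors_needed(n, g) <= p)
        s = -(-n // r)  # ceil(N(t) / R(t))
        rows.append((n, r, s))
        n = r           # R(t) = N(t+1)
    return rows

for t, (n, r, s) in enumerate(schedule(100)):
    print(f"t={t}: N(t)={n}  R(t)={r}  S(t)={s}")
```

Under this rule, P = 100 gives three stages with group sizes 3, 7 and 5.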

In the original analysis of Valiant’s algorithm, each stage required constant time. With
the current assumptions about sequential access to memory locations, stage t requires O(S(t))
time to complete. Thus, the time for the entire algorithm depends on the sum of the O(loglog
P) values of S(t). It will be shown that if P processors are available, and N = P numbers are
examined for the maximum, then the algorithm operates in $O(P^{1/2})$ time. In fact, the sum
of the S(t)’s is never more than $(2P)^{1/2} + o(P^{1/2})$, and it is greater than
$(2P)^{1/2}$ infinitely often.
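
As a rough numerical illustration of this cost model, the sketch below charges stage t exactly S(t) time units (dropping the constant hidden in O(S(t))) and prints the total next to $(2P)^{1/2}$. The grouping rule (fewest near-equal groups that fit in P processors) and the names `processors_needed` and `total_sequential_cost` are assumptions made for the example, so the numbers indicate the trend only.

```python
from math import comb, sqrt

def processors_needed(n_items, n_groups):
    """Processors used by n_groups near-equal groups, one processor per pair."""
    q, r = divmod(n_items, n_groups)
    return r * comb(q + 1, 2) + (n_groups - r) * comb(q, 2)

def total_sequential_cost(p):
    """Sum of S(t) over all stages, for N = P inputs under the assumed rule."""
    n, total = p, 0
    while n > 1:
        r = next(g for g in range(1, n + 1) if processors_needed(n, g) <= p)
        total += -(-n // r)  # S(t) = ceil(N(t) / R(t))
        n = r
    return total

for p in (100, 1_000, 10_000):
    print(p, total_sequential_cost(p), round(sqrt(2 * p), 2))
```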

To see that the sum of the S(t)’s for all O(loglog P) values of t is never more than
$(2P)^{1/2} + o(P^{1/2})$, the following theorem is needed:

Theorem: For all values of t, $S(t) < (2P/R(t))^{1/2} + 2$.

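Proof: Because S(t) is the size of the largest group at time t, the smallest group at time
t has at least S(t) - 1 items, and thus each group needs at least $\binom{S(t)-1}{2}$
processors. Because there are R(t) groups, $R(t) \binom{S(t)-1}{2} \le P$. From this, it
follows that $(S(t) - 1)(S(t) - 2) \le 2P/R(t)$. Since $(S(t) - 2)^2 < (S(t) - 1)(S(t) - 2)$
whenever S(t) > 2 (and the bound is immediate otherwise), it follows that
$S(t) < (2P/R(t))^{1/2} + 2$.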
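
The bound can be spot-checked numerically. The sketch below is a sanity check, not a substitute for the proof: it assumes the same hypothetical grouping rule as before (fewest near-equal groups that fit in P processors) and tests the inequality $S(t) < (2P/R(t))^{1/2} + 2$ at every stage for a range of P. The names `processors_needed` and `theorem_holds` are illustrative.

```python
from math import comb, sqrt

def processors_needed(n_items, n_groups):
    """Processors used by n_groups near-equal groups, one processor per pair."""
    q, r = divmod(n_items, n_groups)
    return r * comb(q + 1, 2) + (n_groups - r) * comb(q, 2)

def theorem_holds(p):
    """Check S(t) < sqrt(2P / R(t)) + 2 at every stage for N = P inputs."""
    n = p
    while n > 1:
        r = next(g for g in range(1, n + 1) if processors_needed(n, g) <= p)
        s = -(-n // r)  # S(t) = ceil(N(t) / R(t))
        if not s < sqrt(2 * p / r) + 2:
            return False
        n = r
    return True

print(all(theorem_holds(p) for p in range(2, 500)))
```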

Note that as a corollary to the theorem, $S(t) < (2P)^{1/2} + 2$ for all values of t because
$R(t) \ge 1$. Now consider the values of S(t) for all O(loglog P) values of t. Either all of the